Video coding method and decoding method and devices thereof

ABSTRACT

A new predictive coding is used to increase the temporal frame rate and coding efficiency without introducing excessive delay. Currently the motion vector for the blocks in the bi-directionally predicted frame is derived from the motion vector of the corresponding block in the forward predicted frame using a linear motion model. This however is not effective when the motion in the image sequence is not linear. The efficiency of this method can be further improved if a non-linear motion model is used. In this model a delta motion vector is added to or subtracted from the derived forward and backward motion vector, respectively. The encoder performs an additional search to determine if there is a need for the delta motion vector. The presence of this delta motion vector in the transmitted bitstream is signalled to the decoder which then takes the appropriate action to make use of the delta motion vector to derive the effective forward and backward motion vectors for the bi-directionally predicted block.

This application is a divisional reissue application of U.S. Pat. No. 5,825,421, issued Oct. 20, 1998. This application also has a related reissue application Ser. No. 09/691,857, filed on Oct. 18, 2000.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention can be used in low bit rate video coding for tele-communicative applications. It improves the temporal frame rate of the decoder output as well as the overall picture quality.

2. Related art of the Invention

In a typical hybrid transform coding algorithm such as ITU-T Recommendation H.261 [1] and MPEG [2], motion compensation is used to reduce the amount of temporal redundancy in the sequence. In the H.261 coding scheme, the frames are coded using only forward prediction, hereafter referred to as P-frames. In the MPEG coding scheme, some frames are coded using bi-directional prediction, hereafter referred to as B-frames. B-frames improve the efficiency of the coding scheme. Here, [1] is ITU-T Recommendation H.261 (formerly CCITT Recommendation H.261), Codec for audiovisual services at p×64 kbit/s, Geneva, 1990, and [2] is ISO/IEC 11172-2:1993, Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s—Part 2: Video.

However, bi-directional prediction introduces delay in the encoding and decoding, making it unsuitable for applications in communicative services where delay is an important parameter. FIGS. 1a and 1b illustrate the frame prediction of H.261 and MPEG as described above. A new method of coding involving the coding of the P and B frames as a single unit, hereafter referred to as the PB-frame, was introduced. In this scheme the blocks in the PB-frames are coded and transmitted together, thus reducing the total delay. In fact the total delay should be no more than that of a scheme using forward prediction only but at half the frame rate.

FIG. 2a shows the PB-frame prediction. A PB-frame consists of two pictures being coded as one unit. The name PB comes from the names of picture types in MPEG, where there are P-frames and B-frames. Thus a PB-frame consists of one P-frame, which is predicted from the last decoded P-frame, and one B-frame, which is predicted both from the last decoded P-frame and the P-frame currently being decoded. This last picture is called a B-frame because parts of it may be bi-directionally predicted from the past and future P-frames.

FIG. 2b shows the forward and bi-directional prediction for a block in the B-frame, hereafter referred to as a B-block. Only the region that overlaps with the corresponding block in the current P-frame, hereafter referred to as the P-block, is bi-directionally predicted. The rest of the B-block is forward predicted from the previous frame. Thus only the previous frame is required in the frame store. The information from the P-frame is obtained from the P-block currently being decoded.

In the PB-block only the motion vectors for the P-block are transmitted to the decoder. The forward and backward motion vectors for the B-block are derived from the P motion vectors. A linear motion model is used, and the temporal references of the B and P frames are used to scale the motion vector appropriately. FIG. 3a depicts the motion vector scaling, and the formulas are shown below:

MV_(F) = (TR_(B) × MV) / TR_(P)    (1)

MV_(B) = ((TR_(B) − TR_(P)) × MV) / TR_(P)    (2)

where

-   MV is the motion vector of the P-block,
-   MV_(F) and MV_(B) are the forward and backward motion vectors for the B-block,
-   TR_(B) is the increment in the temporal reference from the last P-frame to the current B-frame, and
-   TR_(P) is the increment in the temporal reference from the last P-frame to the current P-frame.
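As a minimal illustrative sketch of equations (1) and (2), the following Python function scales one component of the P-block motion vector. The function names are hypothetical, and the rounding of the division toward zero is an assumption made only for illustration; the description above does not state how fractional results are handled.

```python
def div_trunc(n, d):
    """Integer division rounding toward zero (an assumed rounding rule)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def scale_motion_vector(mv, tr_b, tr_p):
    """Apply equations (1) and (2) to one motion vector component.

    mv   : P-block motion vector component (integer units, assumed)
    tr_b : temporal reference increment, last P-frame to current B-frame
    tr_p : temporal reference increment, last P-frame to current P-frame
    """
    mv_f = div_trunc(tr_b * mv, tr_p)            # equation (1)
    mv_b = div_trunc((tr_b - tr_p) * mv, tr_p)   # equation (2)
    return mv_f, mv_b
```

For instance, with TR_(B) = 1 and TR_(P) = 2, a P-block vector component of 6 yields a forward component of 3 and a backward component of −3 under these assumptions.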

Currently the method used in the prior art assumes a linear motion model. However, this assumption is not valid in a normal scene where the motion is typically not linear. This is especially true when the camera shakes and when objects are not moving at constant velocities.

A second problem involves the quantization and transmission of the residual of the prediction error in the B-block. Currently the coefficients from the P-block and the B-block are interleaved in some scanning order, which requires the B-block coefficients to be transmitted even when they are all zero. This is inefficient because quite often there are no residual coefficients to transmit (all coefficients are zero).

SUMMARY OF THE INVENTION

In order to solve the first problem, the current invention employs a delta motion vector to compensate for the non-linear motion. Thus it becomes necessary for the encoder to perform an additional motion search to obtain the optimum delta motion vector that, when added to the derived motion vectors, would result in the best match in the prediction. These delta motion vectors are transmitted to the decoder at the block level only when necessary. A flag is used to indicate to the decoder if there are delta motion vectors present for the B-block.

For the second problem, this invention also uses a flag to indicate if there are coefficients for the B-block to be decoded.

The operation of the invention is described as follows.

FIG. 3a shows the linear motion model used for the derivation of the forward and backward motion vectors from the P-block motion vector and the temporal reference information. As illustrated in FIG. 3b, this model breaks down when the motion is not linear. The derived forward and backward motion vectors differ from the actual motion vectors when the motion is not linear. This is especially true when objects in the scene are moving at changing velocities.

In the current invention the problem is solved by adding a small delta motion vector to the derived motion vector to compensate for the difference between the derived and true motion vectors. Therefore equations (1) and (2) are now replaced by equations (3) and (4), respectively.

MV_(F)′ = (TR_(B) × MV) / TR_(P) + MV_(Delta)    (3)

MV_(B)′ = ((TR_(B) − TR_(P)) × MV) / TR_(P) − MV_(Delta)    (4)

where

-   MV is the motion vector of the P-block,
-   MV_(Delta) is the delta motion vector,
-   MV_(F)′ and MV_(B)′ are the new forward and backward motion vectors for the B-block according to the current invention,
-   TR_(B) is the increment in the temporal reference from the last P-frame to the current B-frame, and
-   TR_(P) is the increment in the temporal reference from the last P-frame to the current P-frame.

Note: Equations (3) and (4) are used for the motion vector in the horizontal as well as the vertical directions. Thus the motion vectors are in pairs and there are actually two independent delta motion vectors, one each for the horizontal and vertical directions.
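Equations (3) and (4) can be sketched in Python as follows, applied to the horizontal and vertical components together. The function names, the use of (horizontal, vertical) tuples, and the rounding of the division toward zero are assumptions made for illustration and are not taken from the description.

```python
def div_trunc(n, d):
    """Integer division rounding toward zero (an assumed rounding rule)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def final_b_vectors(mv, delta, tr_b, tr_p):
    """Return (forward, backward) vectors per equations (3) and (4).

    mv and delta are (horizontal, vertical) tuples; each direction has
    its own independent delta component, as stated in the note above.
    """
    forward = tuple(div_trunc(tr_b * c, tr_p) + d
                    for c, d in zip(mv, delta))            # equation (3)
    backward = tuple(div_trunc((tr_b - tr_p) * c, tr_p) - d
                     for c, d in zip(mv, delta))           # equation (4)
    return forward, backward
```

With a zero delta in both directions the result reduces to the linear model of equations (1) and (2).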

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a illustrates the prior art prediction mode used in the ITU-T Recommendation H.261 Standard.

FIG. 1b illustrates the prior art prediction mode used in the ISO-IEC/JTC MPEG Standard.

FIG. 2a illustrates the PB-frame prediction mode.

FIG. 2b illustrates the B-block bi-directional prediction mode.

FIG. 3a illustrates the linear motion model.

FIG. 3b illustrates the non-linear motion model of the current invention.

FIG. 4 illustrates the encoder functionality block diagram.

FIG. 5 illustrates the B-block bi-directional prediction functionality block diagram.

FIG. 6 illustrates the decoder functionality block diagram.

PREFERRED EMBODIMENTS

The preferred embodiment of the current invention is described here. FIG. 4 illustrates the encoding functionality diagram. The present invention deals with the method for deriving the motion vectors for the B-block. The encoding functionality is presented here for completeness of the embodiment.

The encoding functionality block diagram depicts an encoder using motion estimation and compensation for reducing the temporal redundancy in the sequence to be coded. The input sequence is organized into a first frame and pairs of subsequent frames. The first frame, hereafter referred to as the I-frame, is coded independently of all other frames. Each pair of subsequent frames, hereafter referred to as a PB-frame, consists of a B-frame followed by a P-frame. The P-frame is forward predicted based on the previously reconstructed I-frame or P-frame, and the B-frame is bi-directionally predicted based on the previously reconstructed I-frame or P-frame and the information in the current P-frame.

The input frame image sequence, 1, is placed in the Frame Memory 2. If the frame is classified as an I-frame or a P-frame it is passed through line 14 to the Reference Memory 3, for use as the reference frame in the motion estimation of the next PB-frame to be predictively encoded. The signal is then passed through line 13 to the Block Sampling module 4, where it is partitioned into spatially non-overlapping blocks of pixel data for further processing.

If the frame is classified as an I-frame, the sampled blocks are passed through line 16 to the DCT module 7. If the frame is classified as a PB-frame, the sampled blocks are passed through line 17 to the Motion Estimation module 5. The Motion Estimation module 5 uses information from the Reference Frame Memory 3 and the current block 17 to obtain the motion vector that provides the best match for the P-block. The motion vector and the local reconstructed frame, 12, are passed through lines 19 and 20, respectively, to the Motion Compensation module 6. The difference image is formed by subtracting the motion compensated decoded frame, 21, from the current P-block 15. This signal is then passed through line 22 to the DCT module 7.

In the DCT module 7, each block is transformed into DCT domain coefficients. The transform coefficients are passed through line 23 to the Quantization module 8, where they are quantized. The quantized coefficients are then passed through line 24 to the Run-length & Variable Length Coding module 9. Here the coefficients are entropy coded to form the Output Bit Stream 25.

If the current block is an I-block or a P-block, the quantized coefficients are also passed through line 26 to the Inverse Quantization module 10. The output of the Inverse Quantization 10 is then passed through line 27 to the Inverse DCT module 11. If the current block is an I-block then the reconstructed block is placed, via line 28, in the Local Decoded Frame Memory 12. If the current block is a P-block then the output of the Inverse DCT 29 is added to the motion compensated output 21 to form the reconstructed block 30. The reconstructed block 30 is then placed in the Local Decoded Frame Memory 12 for the motion compensation of the subsequent frames.

After the P-block has been locally reconstructed, the information is passed again to the Motion Compensation Module 6, where the prediction of the B-block is formed. FIG. 5 shows a more detailed functional diagram for the B-block prediction process. The P-motion vector derived in the Motion Estimation module 51 is passed through line 57 to the Motion Vector Scaling Module 53. Here the forward and backward motion vectors of the B-block are derived using formulas (1) and (2), respectively. In the present embodiment, an additional motion search around these vectors is performed in the Delta Motion Search module 54 to obtain the delta motion vector. In this embodiment the delta motion vector is obtained by performing the search for all delta motion vector values between −3 and 3. The delta motion vector value that gives the best prediction, in terms of the mean absolute difference between the pixel values of the B-block and the prediction block, is chosen. The prediction is formed in the Bi-directional Motion Compensation module 55, according to FIG. 2b, using the information from the Local Decoded Frame Memory 52 and the Current Reconstructed P-block 59. In the bi-directional prediction, only information available in the corresponding P-block is used to predict the B-block. The average of the P-block information and the information from the Local Decoded Frame is used to predict the B-block. The rest of the B-block is predicted using information from the Local Decoded Frame only.
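The delta motion search and the B-block prediction described above might be sketched in Python as below. This is only an illustration under several assumptions that the description does not fix: frames and blocks are lists of rows of integer pixel values, vectors are (horizontal, vertical) in whole pixels, forward references are clamped at the frame border, the average is truncated, and a backward sample counts as available only when it falls inside the co-located P-block. All function and parameter names are hypothetical.

```python
def div_trunc(n, d):
    """Integer division rounding toward zero (an assumed rounding rule)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def predict_b_block(prev, p_block, top, left, mv_f, mv_b):
    """Predict a B-block from the previous frame and the current P-block."""
    n = len(p_block)
    h, w = len(prev), len(prev[0])
    pred = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Forward sample from the previous frame (border clamping assumed).
            fy = min(max(top + i + mv_f[1], 0), h - 1)
            fx = min(max(left + j + mv_f[0], 0), w - 1)
            fwd = prev[fy][fx]
            # Backward sample is used only where it overlaps the P-block.
            by, bx = i + mv_b[1], j + mv_b[0]
            if 0 <= by < n and 0 <= bx < n:
                pred[i][j] = (fwd + p_block[by][bx]) // 2  # bi-directional
            else:
                pred[i][j] = fwd                           # forward only
    return pred


def delta_motion_search(b_block, prev, p_block, top, left, mv,
                        tr_b, tr_p, search=3):
    """Try every delta in [-search, search]^2; return (delta, best MAD)."""
    n = len(b_block)
    scaled_f = [div_trunc(tr_b * c, tr_p) for c in mv]           # eq. (1)
    scaled_b = [div_trunc((tr_b - tr_p) * c, tr_p) for c in mv]  # eq. (2)
    best_delta, best_mad = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            mv_f = (scaled_f[0] + dx, scaled_f[1] + dy)          # eq. (3)
            mv_b = (scaled_b[0] - dx, scaled_b[1] - dy)          # eq. (4)
            pred = predict_b_block(prev, p_block, top, left, mv_f, mv_b)
            sad = sum(abs(b_block[i][j] - pred[i][j])
                      for i in range(n) for j in range(n))
            mad = sad / (n * n)  # mean absolute difference
            if best_mad is None or mad < best_mad:
                best_delta, best_mad = (dx, dy), mad
    return best_delta, best_mad
```

The exhaustive loop visits 49 candidate deltas per block; ties are resolved in favour of the first candidate visited, which is an arbitrary choice in this sketch.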

The prediction difference block is then passed through line 22 to the DCT module 7. The DCT coefficients are then passed through line 23 to the Quantization module 8. The result of the Quantization module 8 is passed through line 24 to the Run-length & Variable Length Coding module 9. In this module the presence of the delta motion vector and the quantized residual error in the Output Bitstream 25 is indicated by a variable length code, NOB, which is the acronym for No B-block. This flag is generated in the Run-length & Variable Length Coding module 9 based on whether there is residual error from the Quantization module 8 and whether the delta motion vector found in the Delta Motion Search module 54 is non-zero. Table 1 provides the preferred embodiment of the variable length code for the NOB flag. The variable length code of the NOB flag is inserted in the Output Bitstream 25 prior to the delta motion vector and quantized residual error codes.

TABLE 1 (Variable length code for the NOB flag)

NOB    Quantized Residual Error Coded    Delta Motion Vectors Coded
0      No                                No
10     No                                Yes
110    Yes                               No
111    Yes                               Yes
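The codes of Table 1 form a prefix code, so a decoder can read the flag bit by bit. A minimal Python sketch follows; representing the bitstream as a string of "0" and "1" characters and the function names are assumptions made for illustration only.

```python
# Table 1: (residual_coded, delta_coded) -> NOB variable length code.
NOB_CODES = {
    (False, False): "0",
    (False, True): "10",
    (True, False): "110",
    (True, True): "111",
}


def encode_nob(residual_coded, delta_coded):
    """Return the NOB code for the given presence flags."""
    return NOB_CODES[(bool(residual_coded), bool(delta_coded))]


def decode_nob(bits, pos=0):
    """Read one NOB code from a bit string starting at pos.

    Returns (residual_coded, delta_coded, next_pos).
    """
    if bits[pos] == "0":
        return False, False, pos + 1
    if bits[pos + 1] == "0":
        return False, True, pos + 2
    if bits[pos + 2] == "0":
        return True, False, pos + 3
    return True, True, pos + 3
```

The shortest code is assigned to the case in which neither the delta motion vector nor the residual error is coded, so a B-block with nothing to transmit costs a single bit.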

FIG. 6 shows the functional block diagram for the decoder. The Input Bit Stream 31 is passed to the Variable Length & Run Length Decoding module 32. The block and side information are extracted in this module. If the frame is a PB-frame, the bitstream is checked for the presence of any delta motion vector and/or quantized residual error coefficients. The output of the module 32 is passed through line 37 to the Inverse Quantization module 33. The output of the Inverse Quantization 33 is then passed through line 38 to the Inverse DCT module 34. Here the coefficients are transformed back into the pixel values.

If the current frame is an I-frame then the output of Inverse DCT 34 is passed through line 39 and stored in the Frame Memory 42.

If the current frame is a PB-frame, the side information containing the motion vector is passed through line 45 to the Motion Compensation module 36. The Motion Compensation module 36 uses this information and the information in the Local Decoded Memory, 35, to form the motion compensated signal, 44. This signal is then added to the output of the Inverse DCT module 34 to form the reconstruction of the P-block.

The Motion Compensation module 36 then uses the additional information obtained in the reconstructed P-block to obtain the bi-directional prediction for the B-block. The B-block is then reconstructed and placed in the Frame Memory, 42, together with the P-block.

By implementing this invention, the temporal frame rate of the decoded sequences can be effectively doubled at a fraction of the expected cost in bit rate. The delay is similar to that of the same sequence decoded at half the frame rate.

As described above, in the present invention a new predictive coding is used to increase the temporal frame rate and coding efficiency without introducing excessive delay. Currently the motion vector for the blocks in the bi-directionally predicted frame is derived from the motion vector of the corresponding block in the forward predicted frame using a linear motion model. This however is not effective when the motion in the image sequence is not linear. According to this invention, the efficiency of this method can be further improved if a non-linear motion model is used. In this model a delta motion vector is added to or subtracted from the derived forward and backward motion vector, respectively. The encoder performs an additional search to determine if there is a need for the delta motion vector. The presence of this delta motion vector in the transmitted bitstream is signalled to the decoder, which then takes the appropriate action to make use of the delta motion vector to derive the effective forward and backward motion vectors for the bi-directionally predicted block.

1. A method for encoding a sequence of video image frames comprising the steps of: dividing a source sequence into a set of group of pictures, each group of pictures comprising a first frame (I-frame) followed by a plurality of pairs of predictively encoded frames (PB-frame pairs), each PB-frame pair having a corresponding P-block; dividing each I-frame or PB-frame pair into a plurality of spatially non-overlapping blocks of pixel data; encoding the blocks from the I-frame (I-blocks) independently from any other frames in the group of pictures; predictively encoding the blocks from the second frame of the PB-frame pair (P-blocks), based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair; bi-directionally predictively encoding the blocks from the first frame of the PB-frame pair (B-blocks), based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair and the corresponding P-block in the current PB-frame pair; deriving a scaled forward motion vector and a scaled backward motion vector of the B-block by scaling the motion vector of the corresponding P-block in the current PB-frame pair; obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; and obtaining a final backward motion vector for the B-block by subtracting the delta motion vector from the scaled backward motion vector.
2. A method for encoding a sequence of video image frames according to claim 1, wherein the scaling of the motion vector is based on a temporal reference of the first and second frames of the PB-frame pair.
3. A method for encoding a sequence of video image frames according to claim 1, further comprising the step of forming an encoded output, wherein the encoded output is a bitstream comprising: temporal reference information for the first and second frames of the PB-frame pairs; motion vector information for the P-blocks; quantized residual error information for the P-blocks; delta motion vector information for the B-blocks; and quantized residual error information for the B-blocks.
4. A method for encoding a sequence of video image frames according to claim 3, wherein the output bitstream contains additional information to indicate the presence of at least one of: the delta motion vector information for the B-blocks; and the quantized residual error information for the B-blocks.
5. A method for decoding a sequence of video image frames comprising the steps of: decoding the compressed video image sequence as a set of group of pictures, each group of pictures comprising an I-frame followed by a plurality of PB-frame pairs, each PB-frame pair having a corresponding P-block; decoding each I-frame or PB-frame pair into a plurality of spatially non-overlapping blocks of pixel data; decoding the I-blocks from the I-frame independently from any other frames in the group of pictures; predictively decoding the P-block from the second frame of the PB-frame pair based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair; bi-directionally predictively decoding the B-blocks from the first frame of the PB-frame pair based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair and the corresponding P-block in the current PB-frame pair; deriving a scaled forward motion vector and a scaled backward motion vector for the B-block by scaling the motion vector of the corresponding P-block in the current PB-frame pair; obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; and obtaining a final backward motion vector for the B-block by subtracting the delta motion vector from the scaled backward motion vector.
6. A method for decoding a sequence of video image frames according to claim 5, further comprising the step of forming a decoded output, wherein the decoded output is responsive to a bitstream comprising: temporal reference information for the first and second frames of the PB-frame pairs; motion vector information for the P-blocks; quantized residual error information for the P-blocks; the delta motion vector information for the B-blocks; and quantized residual error information for the B-blocks.
7. A method for decoding a sequence of video image frames according to claim 6, wherein the bitstream contains additional information to indicate the presence of at least one of: the delta motion vector information for the B-blocks; and the quantized residual error information for the B-block.
8. A method of decoding a sequence of video image frames according to claim 5, wherein the scaling is based on a temporal reference of the first and second frames of the PB-frame pair.
9. An apparatus for encoding a sequence of video image frames comprising: means for encoding each frame in a sequence of video image frames into a set of group of pictures, each group of pictures comprising an I-frame followed by a plurality of PB-frame pairs; means for dividing the I-frame and the PB-frame pair into a plurality of spatially non-overlapping blocks of pixel data; means for encoding and decoding the I-blocks of the I-frame independently from any other frames in the group of pictures; means for storing the decoded I-blocks to predictively encode subsequent frames; means for predictively encoding and decoding the P-blocks of the second frame of the PB-frame pair based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair; means for storing the decoded P-block to predictively encode subsequent frames; means for deriving a scaled forward motion vector and a scaled backward motion vector for a B-block by scaling the motion vector of the corresponding P-block in the current PB-frame pair, the B-block being the first frame of the PB-frame pair; means for obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; means for obtaining a final backward motion vector for the B-block by subtracting the same delta motion vector from the scaled backward motion vector; and means for encoding the B-blocks of the first frame of the PB-frame pairs based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair and the corresponding P-block in the current PB-frame pair using the final forward motion vector and the final backward motion vector.
10. An apparatus for decoding a sequence of video image frames comprising: means for decoding each frame in a sequence of video image frames into a set of group of pictures, each group of pictures comprising an I-frame followed by a plurality of PB-frame pairs; means for decoding the I-blocks of the I-frame independently of any other frames in the group of pictures; means for storing the decoded I-blocks to predictively decode subsequent frames; means for decoding the P-blocks of the second frame of the PB-frame pair based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair; means for storing the decoded P-blocks to predictively decode subsequent frames; means for deriving a scaled forward motion vector and a scaled backward motion vector for a B-block by scaling the motion vector of the corresponding P-block in the current PB-frame pair, the B-block being the first frame of the PB-frame pair; means for obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; means for obtaining a final backward motion vector for the B-block by subtracting the delta motion vector from the scaled backward motion vector; and means for decoding the B-blocks of the first frame of the PB-frame pairs based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame pair and the corresponding P-block in the current PB-frame pair using the final forward motion vector and the final backward motion vector.
11. A method for encoding a sequence of video image frames comprising the steps of: dividing a source sequence into a plurality of groups of pictures, each group of pictures comprising a first frame (I-frame) followed by a plurality of pairs of predictively encoded frames (PB-frame pairs); dividing each I-frame or PB-frame pair into a plurality of blocks; encoding the blocks from the I-frame; predictively encoding the blocks from the second frame of the PB-frame pair; bi-directionally predictively encoding the blocks from the first frame of a PB-frame pair (B-blocks); deriving a scaled forward motion vector and a scaled backward motion vector for the B-block; obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; and obtaining a final backward motion vector for the B-block by subtracting the delta motion vector from the scaled backward motion vector.
12. An apparatus for encoding a sequence of video image frames comprising: means for dividing a source sequence into a plurality of groups of pictures, each group of pictures comprising a first frame (I-frame) followed by a plurality of pairs of predictively encoded frames (PB-frame pairs); means for dividing each I-frame or PB-frame pair into a plurality of blocks; means for encoding the blocks from the I-frame; means for predictively encoding the blocks from the second frame of the PB-frame pair; means for bi-directionally predictively encoding the blocks from the first frame of a PB-frame pair (B-blocks); means for deriving a scaled forward motion vector and a scaled backward motion vector for the B-block; means for obtaining a final forward motion vector for the B-block by adding a delta motion vector to the scaled forward motion vector; and means for obtaining a final backward motion vector for the B-block by subtracting the delta motion vector from the scaled backward motion vector.
13. A method for decoding a compressed video image sequence of a group of pictures including an I-frame followed by a plurality of P-frames and B-frames, comprising the steps of: decoding a block in the I-frame independently from any other frames in the group of pictures; predictively decoding a block in a P-frame based on the previous I-frame or a previous P-frame; bi-directionally predictively decoding a block in a B-frame based on the previous I-frame or a previous P-frame and a block in a P-frame positioned after the B-frame; deriving a scaled forward motion vector and a scaled backward motion vector for the block in the B-frame by scaling a motion vector of the block in the P-frame positioned after the B-frame; obtaining a final forward motion vector for the block in the B-frame by adding a delta motion vector to the scaled forward motion vector; and obtaining a final backward motion vector for the block in the B-frame by adding the delta motion vector to the scaled backward motion vector.
14. A method of decoding a sequence of video image frames according to claim 13, wherein the deriving step includes: scaling of the forward and backward motion vectors is based on a temporal reference of the B-frame and the P-frame.
15. A method for decoding a sequence of video image frames according to claim 13, further comprising the step of forming a decoded output, wherein the decoded output is responsive to a bitstream comprising: temporal reference information for the B-frame and the P-frame; motion vector information for the block in the P-frame; quantized residual error information for the block in the P-frame; the delta motion vector information for the block in the B-frame; and quantized residual error information for the block in the B-frame.
16. A method for decoding a sequence of video image frames according to claim 15, wherein the bitstream contains additional information indicating a presence of at least one of the delta motion vector information for the block in the B-frame; and the quantized residual error information for the block in the B-frame.